[OCC] change incarnation metric type #433
Open: stevenlanders wants to merge 36 commits into occ-main from change-incarnation-metric-type
This adds some comments with some useful code pointers for existing logic and discussing future OCC work NA
## Describe your changes and provide context
Add multiversion store data structures file, and implement the multiversioned item
## Testing performed to validate your change
Added unit tests to verify behavior
## Describe your changes and provide context
This adds the incarnation field to the multiversion item data structure.
## Testing performed to validate your change
updated unit tests
## Describe your changes and provide context
This implements the multiversion with basic functionality, but still needs additional work to implement the iterator functionality and/or persisting readsets for validation
## Testing performed to validate your change
Added unit tests for basic multiversion store
## Describe your changes and provide context
- `ConcurrencyWorkers` represents the number of workers to use for concurrent transactions
- since `concurrency-workers` is a baseapp-level setting, implementations (like sei-chain) shouldn't have to pass it (but can)
- it defaults to 10 if not set (via cli default value)
- it defaults to 10 in app.toml only if that file is being created (and doesn't exist)
- if explicitly set to zero on command line, it will override with the default (for safety)
- cli takes precedence over the config file
- no one has to do anything to get it to be 10 (no config changes and no sei-chain changes required, aside from the new cosmos version)
## Testing performed to validate your change
- Unit Tests for setting the value
- Manually testing scenarios with sei-chain
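The precedence rules above can be sketched roughly as follows. This is a minimal illustration, not the baseapp code: `ResolveConcurrencyWorkers` and the nil-pointer convention for "not set" are hypothetical, and treating a zero or negative value from either source as "use the default" is an assumption that goes slightly beyond the description, which only mentions an explicit zero on the command line.

```go
package main

// DefaultConcurrencyWorkers mirrors the default of 10 described above.
const DefaultConcurrencyWorkers = 10

// ResolveConcurrencyWorkers is a hypothetical helper (not the sei-cosmos API).
// cliValue and configValue are nil when that source did not set the option.
func ResolveConcurrencyWorkers(cliValue, configValue *int) int {
	workers := DefaultConcurrencyWorkers
	if configValue != nil {
		workers = *configValue
	}
	if cliValue != nil {
		// The CLI flag takes precedence over the config file.
		workers = *cliValue
	}
	if workers <= 0 {
		// Assumption: any non-positive value falls back to the default for safety.
		workers = DefaultConcurrencyWorkers
	}
	return workers
}
```

With this sketch, a config value of 4 and no CLI flag resolves to 4, while a CLI flag of 0 resolves to 10 regardless of the config file.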
## Describe your changes and provide context
This adds in functionality to write the latest multiversion values to another store (to be used for writing to parent after transaction execution), and also adds in helpers for writeset management such as setting, invalidating, and setting estimated writesets.
## Testing performed to validate your change
Unit testing for added functionality
## Describe your changes and provide context
- `sei-cosmos` will receive a list of transactions, so that sei-chain does not need to hold the logic for OCC
- This will make the logic easier to test, as sei-cosmos will be fairly self-contained
- Types can be extended within a tx and within request/response

Example interaction:
<img src="https://github.com/sei-protocol/sei-cosmos/assets/6051744/58c9a263-7bc6-4ede-83ab-5e34794510b1" width=50% height=50%>
## Testing performed to validate your change
- This is a skeleton for a batch interface
## Describe your changes and provide context
This implements an mvkv store that manages access from a transaction execution to the underlying multiversion store, and to the underlying parent store if the multiversion store doesn't have the key. It first serves reads from its own writeset and readset; if it has to fall through to the multiversion store or parent store, it adds those values to the readset.
## Testing performed to validate your change
Unit tests
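A minimal sketch of the read path described above, assuming string keys and values and a simplified lookup interface. `Store`, `MapStore`, and this `VersionIndexedStore` shape are hypothetical stand-ins for illustration; the real store also handles versions, estimates, and missing keys, which are left out here.

```go
package main

// Store is a simplified read-only lookup used only in this sketch.
type Store interface {
	Get(key string) (value string, ok bool)
}

// MapStore is a map-backed Store standing in for the multiversion and parent stores.
type MapStore map[string]string

func (m MapStore) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// VersionIndexedStore is a hypothetical per-transaction view.
type VersionIndexedStore struct {
	writeset     map[string]string
	readset      map[string]string
	multiVersion Store
	parent       Store
}

func NewVersionIndexedStore(multiVersion, parent Store) *VersionIndexedStore {
	return &VersionIndexedStore{
		writeset:     map[string]string{},
		readset:      map[string]string{},
		multiVersion: multiVersion,
		parent:       parent,
	}
}

// Set records a write in the transaction's own writeset.
func (s *VersionIndexedStore) Set(key, value string) {
	s.writeset[key] = value
}

// Get serves reads from the writeset, then the readset, then falls through to
// the multiversion store and finally the parent store. Values obtained by
// falling through are added to the readset.
func (s *VersionIndexedStore) Get(key string) (string, bool) {
	if v, ok := s.writeset[key]; ok {
		return v, true
	}
	if v, ok := s.readset[key]; ok {
		return v, true
	}
	if v, ok := s.multiVersion.Get(key); ok {
		s.readset[key] = v
		return v, true
	}
	if v, ok := s.parent.Get(key); ok {
		s.readset[key] = v
		return v, true
	}
	// Simplification: a miss is not recorded in the readset in this sketch.
	return "", false
}
```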
…ore (#330)
## Describe your changes and provide context
This adds in validation for transaction state to multiversion store, and implements readset validation for it as well.
## Testing performed to validate your change
Unit Test
## Describe your changes and provide context
- Adds a basic scheduler shell (see TODOs)
- Adds a basic task definition with request/response/index
- Listens to abort channel after an execution to determine conflict
## Testing performed to validate your change
- Compiles (holding off until shape is validated)
- Basic Unit Test for ProcessAll
## Describe your changes and provide context
This implements Iterator and ReverseIterator for mvkv for the KVStore interface. The memiterator is composed of versionindexedstore and multiversionstore, and yields values in a cascading fashion: first from the writeset, then from the multiversion store. This still needs optimization to persist sorted keys instead of reconstructing the sorted keys each time.
## Testing performed to validate your change
Unit test to verify basic functionality
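The cascading behavior can be illustrated with a small sketch. `KV` and `MergedItems` are hypothetical names, and plain maps stand in for the writeset and the multiversion store; like the version described above, this sketch rebuilds the sorted key list on every call rather than persisting it.

```go
package main

import "sort"

// KV is a key/value pair yielded by the sketch iterator.
type KV struct {
	Key, Value string
}

// MergedItems returns the items an iterator would yield over the union of the
// writeset and the multiversion store, with the writeset taking precedence.
// ascending=false models a reverse iterator.
func MergedItems(writeset, multiVersion map[string]string, ascending bool) []KV {
	seen := make(map[string]struct{}, len(writeset)+len(multiVersion))
	keys := make([]string, 0, len(writeset)+len(multiVersion))
	for _, src := range []map[string]string{writeset, multiVersion} {
		for k := range src {
			if _, dup := seen[k]; !dup {
				seen[k] = struct{}{}
				keys = append(keys, k)
			}
		}
	}
	// The sorted key list is reconstructed on every call.
	sort.Strings(keys)
	if !ascending {
		for i, j := 0, len(keys)-1; i < j; i, j = i+1, j-1 {
			keys[i], keys[j] = keys[j], keys[i]
		}
	}
	items := make([]KV, 0, len(keys))
	for _, k := range keys {
		if v, ok := writeset[k]; ok {
			items = append(items, KV{k, v})
			continue
		}
		items = append(items, KV{k, multiVersion[k]})
	}
	return items
}
```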
## Describe your changes and provide context
This fixes a dependency that was refactored, and enables commit push CI for occ-main
## Testing performed to validate your change
CI
## Describe your changes and provide context
This implements a tracked iterator that is used to keep track of keys that have been iterated, and to also save metadata about the iteration for LATER validation. The iterator will be replayed, and if there are any new keys or any keys missing within the iteration range, it will fail validation. The actual values served by the iterator are covered by readset validation. Additionally, the early stop behavior allows the iterateset to ONLY be sensitive to changes to the keys available WITHIN the iteration range.

In the event that we perform iteration and THEN write a key within the range of iteration, this will not fail iteration, because we take a snapshot of the mvkv writeset at the moment of iteration. When we replay the iterator, we populate it with the writeset at that time, so we appropriately replicate the iterator behavior.

In the case that we encounter an ESTIMATE, we have to terminate the iterator validation and mark it as failed, because it is impossible to know whether that ESTIMATE represents a value change or a delete, and a delete would affect the keys available for iteration.

This change also implements handlers that iterators receive for updating readset and iterateset in the `mvkv`
## Testing performed to validate your change
Unit tests for various iteration scenarios
- This was copied from #332 which became unwieldy due to commit history (merges/rebases)
- Adds scheduler logic for validation
- In this initial version it completes all executions then performs validations (which feed retries)
- Once we start benchmarking we can make performance improvements to this
- Retries tasks that fail validation and have no dependencies
- Scheduler Test verifies multi-worker with conflicts
## Describe your changes and provide context
Some tests from sei-chain don't inject a store, and while I'm not sure if that's a valid scenario I made the scheduler.go tolerant to the situation to avoid introducing this assumption to the system.
## Testing performed to validate your change
New unit test confirming lack of crash
## Describe your changes and provide context
- Allows sei-chain to ask isOCCEnabled() so that it can choose to use the OCC logic
- Sei-chain can set this to true according to desired logic
## Testing performed to validate your change
- unit test that sets flag and verifies value
## Describe your changes and provide context
This adds in the ability to prefill estimates based on metadata passed along with deliverTxBatch
## Testing performed to validate your change
Unit Test to verify that multiversion store initialization is now idempotent, and works properly regardless of whether estimate prefill is enabled
## Describe your changes and provide context
- `CollectIteratorItems` needs to hold an RLock to avoid a concurrent access panic
## Testing performed to validate your change
- Reproduced through a sei-chain-side test (concurrent instantiates)
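The kind of fix described above can be sketched as follows. Only the method name `CollectIteratorItems` comes from the description; the `IterateStore` type, its fields, and the return type are hypothetical. The point is that a reader walking a Go map while another goroutine writes to it can panic, so the reader takes the read lock of a `sync.RWMutex`.

```go
package main

import (
	"sort"
	"sync"
)

// IterateStore is a hypothetical store guarded by a read/write mutex.
type IterateStore struct {
	mtx   sync.RWMutex
	items map[string]string
}

func NewIterateStore() *IterateStore {
	return &IterateStore{items: map[string]string{}}
}

// Set takes the write lock because it mutates the map.
func (s *IterateStore) Set(key, value string) {
	s.mtx.Lock()
	defer s.mtx.Unlock()
	s.items[key] = value
}

// CollectIteratorItems holds the read lock while walking the map, so it cannot
// race with a concurrent Set. It returns a sorted copy of the keys.
func (s *IterateStore) CollectIteratorItems() []string {
	s.mtx.RLock()
	defer s.mtx.RUnlock()
	keys := make([]string, 0, len(s.items))
	for k := range s.items {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}
```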
## Describe your changes and provide context
This adds the accesscontrol module behavior to add the tx writeset generation
## Testing performed to validate your change
Unit tests + integration with sei-chain and loadtest cluster testing
## Describe your changes and provide context
- Adds trace span for `SchedulerValidate`
- Adds trace span for `SchedulerExecute`
- Mild refactor (extracted methods) to make it easier to defer span ending
## Testing performed to validate your change
Example trace (run locally)
![image](https://github.com/sei-protocol/sei-cosmos/assets/6051744/b8a032f1-71b1-4e95-b12e-357455ebcc6d)
Example attributes of SchedulerExecute operation
![image](https://github.com/sei-protocol/sei-cosmos/assets/6051744/68992e84-4000-44c1-8597-9d4c10583a66)
## Describe your changes and provide context
This fixes the validation to remove a panic for a case that can actually occur if a transaction writes a key that is later read, and that writing transaction is reverted and then the readset validation reads from parent store. In this case, the readset would have a conflict based on the data available in parent store, so we shouldn't panic. This also adds in the resource types needed for the new DEX_MEM keys
## Testing performed to validate your change
Tested in loadtest cluster
This makes optimizations to the scheduler and validation --------- Co-authored-by: Steven Landers <[email protected]>
Add optimizations to reduce mutex lock contention and refactor with sync Maps. This also removes telemetry that was added liberally, and we can later add in telemetry more mindfully and feature flagged. loadtest chain testing
## Describe your changes and provide context
- adds pool optimizations (bounds by tasks / workers)
- adds validateAll shortcut (starts at first non-validated entry)
- adds invalidation of future tasks on invalidation
## Testing performed to validate your change
- unit tests are passing with full conflicting txs
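One way to picture the last two bullets is the sketch below. It is a simplified guess at the shape of the logic, not the scheduler's code: `Status`, `Task`, and this `ValidateAll` signature are hypothetical, and the `valid` callback stands in for the real readset and iterateset validation.

```go
package main

// Status is a hypothetical task state for this sketch.
type Status int

const (
	Executed Status = iota
	Validated
	Invalid
)

// Task is a hypothetical scheduler task.
type Task struct {
	Status Status
}

// ValidateAll skips the prefix of already validated tasks, validates the rest
// in order, and returns the indices that failed. When a task fails, later
// tasks that were already validated are reset so they get validated again,
// since they may have depended on the failed task's writes.
func ValidateAll(tasks []*Task, valid func(idx int) bool) []int {
	start := len(tasks)
	for i, t := range tasks {
		if t.Status != Validated {
			start = i // shortcut: begin at the first non-validated entry
			break
		}
	}
	var failed []int
	for i := start; i < len(tasks); i++ {
		if valid(i) {
			tasks[i].Status = Validated
			continue
		}
		tasks[i].Status = Invalid
		failed = append(failed, i)
		for _, later := range tasks[i+1:] {
			if later.Status == Validated {
				later.Status = Executed
			}
		}
	}
	return failed
}
```

In a real scheduler the failed tasks would be re-executed before later tasks are validated again; this sketch simply revalidates them in the same pass.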
## Describe your changes and provide context
Update concurrency workers
## Testing performed to validate your change
## Describe your changes and provide context
- instead of assuming one thing will arrive to the abort channel, drain it
## Testing performed to validate your change
- new unit test captures situation (tests iterator)
## Describe your changes and provide context
This change improves the way we track the values of the keys we iterate over when running iterators. Previously, the iterateset would only track the keys that were iterated, and the behavior of the iterator was thought to not include keys that didn't have values present, OR that the readset would be appropriately updated when reading the value from the iterateset. (I'm not yet 100% sure that updating readset WITHIN the tracked iterator is fully necessary, since it may be the case that the readset modifications may have been sufficient to mitigate this issue, but the change is currently in the PR since this is the version of code running on the loadtest cluster for stability testing.)

However, in cases when an earlier transaction was writing to the range that would be iterated, it was possible that the stale value was read by the transaction handler, BUT the value that got into the readset was the newer one. I believe this has to do with the readset updating based on directly querying values from underlying stores, and overwriting the prior readset value that indicated that the transaction used a stale value.

The fix I have made is that during tx execution, the cache memiterator now reads directly from MVKV instead of individually reading from underlying stores. The key difference here is that IF the key is already in the readset, it will serve that STALE value instead of reading into the underlying store where the value may have since mutated. As a result, the behavior we now expect is that once a key is read, ONLY the value that was read will be used for the duration of the transaction. This way, we won't potentially mutate the readset by overwriting the key entry with the later value only to have it incorrectly pass validation. Additionally, to more rigorously enforce this behavior, updating the readset will now only update the map IFF the key doesn't already exist in the readset.

This should provide better guarantees around catching any stale reads that occur over the lifespan of the transaction execution.
## Testing performed to validate your change
Running a lot of iterator heavy workloads on a loadtest cluster to verify that no nondeterminism remains in the iterator workflow
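The "only update IFF the key doesn't already exist" rule, and why it matters for validation, can be shown with a small sketch. `ReadSet`, `Record`, and `Validate` are hypothetical names, and string values replace the real byte slices.

```go
package main

// ReadSet is a hypothetical map from key to the value first observed
// by the transaction.
type ReadSet map[string]string

// Record stores the value only if the key is not already present, so the
// first value a transaction observed is never overwritten by a later one.
func (rs ReadSet) Record(key, value string) {
	if _, exists := rs[key]; !exists {
		rs[key] = value
	}
}

// Validate reports whether every recorded read still matches the latest
// committed value. A stale first read therefore fails validation.
func (rs ReadSet) Validate(latest map[string]string) bool {
	for k, v := range rs {
		if latest[k] != v {
			return false
		}
	}
	return true
}
```

If `Record` overwrote existing entries, a transaction that had acted on a stale value could end up with the newer value in its readset and incorrectly pass `Validate`.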
## Describe your changes and provide context ## Testing performed to validate your change
## Describe your changes and provide context
This adds `occ-enabled` as a config for baseapp to control whether to execute transactions with OCC parallelism.
## Testing performed to validate your change
Tested on sei-chain
## Describe your changes and provide context
This removes the block gas meter for occ, and will eventually be rebased out with a corresponding change that will end up in main
## Testing performed to validate your change
loadtest cluster testing
## Describe your changes and provide context
This reduces some of the locking contention experienced when executing transactions with OCC. Additionally, undoes an earlier revert that reintroduced some locking for event emission
## Testing performed to validate your change
## Describe your changes and provide context ## Testing performed to validate your change --------- Co-authored-by: Yiming Zang <[email protected]>
## Describe your changes and provide context ## Testing performed to validate your change
## Describe your changes and provide context
- **retries** represents number of tx attempts beyond the first attempt
- **max_incarnation** is the highest incarnation seen in a given block
## Testing performed to validate your change
- lower environment
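As an illustration of how the two metrics relate, the sketch below computes both from the final incarnation of each transaction in a block. `BlockMetrics` is a hypothetical helper, and it assumes incarnations start at 0 for the first attempt, so a transaction's final incarnation equals its number of retries.

```go
package main

// BlockMetrics takes the final incarnation of each transaction in a block
// (assumed to start at 0 for the first attempt) and returns the total number
// of retries and the highest incarnation seen.
func BlockMetrics(incarnations []int) (retries, maxIncarnation int) {
	for _, inc := range incarnations {
		retries += inc // attempts beyond the first
		if inc > maxIncarnation {
			maxIncarnation = inc
		}
	}
	return retries, maxIncarnation
}
```

For a block whose three transactions finished at incarnations 0, 2, and 1, this gives 3 retries and a max_incarnation of 2.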
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files

| Coverage Diff | occ-main | #433 | +/- |
|---|---|---|---|
| Coverage | 55.43% | 55.43% | |
| Files | 629 | 629 | |
| Lines | 53493 | 53493 | |
| Hits | 29654 | 29654 | |
| Misses | 21729 | 21729 | |
| Partials | 2110 | 2110 | |